Risk-prediction nomogram for congenital heart disease in offspring of Chinese pregnant women

Background: The identification and assessment of environmental risks are crucial for the primary prevention of congenital heart disease (CHD). We aimed to establish a nomogram model for CHD in the offspring of pregnant women and validate it using a large CHD database in Northwest China.
Methods: A survey was conducted among 29,204 women with infants born between 2010 and 2013 in Shaanxi province, Northwest China. Participants were randomly assigned to the training set and the validation set at a ratio of 7:3. The importance of predictive variables was assessed using random forest. A multivariate logistic regression model was used to construct the nomogram for the prediction of CHD.
Results: Multivariate analyses revealed that gravidity, preterm birth history, family history of birth defects, infection, taking medicine, tobacco exposure, pesticide exposure and singleton/twin pregnancy were significant predictive risk factors for CHD in the offspring of pregnant women. The area under the receiver operating characteristic curve for the prediction model was 0.716 (95% CI: 0.671, 0.760) in the training set and 0.714 (95% CI: 0.630, 0.798) in the validation set, indicating moderate discrimination. The prediction model exhibited good calibration (Hosmer-Lemeshow χ2 = 1.529, P = 0.910).
Conclusions: We developed and validated a predictive nomogram for CHD in the offspring of Chinese pregnant women, facilitating the early prenatal assessment of the risk of CHD and aiding in health education.


Introduction
Congenital heart disease (CHD) is a malformation caused by abnormal cardiovascular development in the fetus. It can lead to miscarriage, stillbirth, and infant mortality, significantly impairs quality of life, and poses a substantial disease burden [1]. CHD is the most prevalent type of congenital malformation, comprising approximately 30% of all birth defects globally [2,3]. In China, the incidence of CHD is estimated at 8 to 10 per thousand live births based on birth defect surveillance data, which corresponds to approximately 150,000 children born with CHD each year, with a mortality rate of 30% within the first year of life [4]. Consequently, CHD represents a significant public health challenge affecting both maternal and child health.
Several predictive models for CHD have been developed. For instance, Huixia Li et al. constructed an artificial neural network model based on 15 predictors for CHD risk in a case-control study [19]. However, only 119 cases and 239 controls were included in their model. In addition, Yun Liang et al. used the Hosmer-Lemeshow test and receiver operating characteristic (ROC) curve analysis to examine maternal risk factors for offspring CHD during pregnancy [20]. The sample sizes in these studies were relatively small, and no predictive tool for CHD was developed, limiting the utility of these models for clinical use. In this study, based on a comprehensive CHD database from Northwest China, we developed and validated a nomogram to predict the risk of CHD in the offspring of pregnant women.

Study design and participants
A survey on CHD was conducted among the population of Shaanxi province in 2013. The survey included women who were pregnant between 2010 and 2013 and covered nine cities within Shaanxi province. A standardized questionnaire was developed by the Xi'an Jiaotong University Health Science Center for this purpose. Trained field staff from the same institution conducted face-to-face interviews. The expected sample size was approximately 32,400 participants; ultimately, 30,027 women completed the survey, a response rate of 92.68%.
From the survey, we collected data on maternal sociodemographic characteristics and periconceptional risk exposures. Information on the occurrence of CHD between enrollment and delivery, along with data on birth defects diagnosed at local hospitals, was also obtained. We excluded 823 individuals because of missing covariate information or unknown pregnancy outcomes, leaving 29,204 individuals enrolled in this study.
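The participant flow described above is simple arithmetic and can be checked directly. The following is a minimal sketch using only the figures reported in the text; the variable names are illustrative and are not taken from the study's own analysis code.

```python
# Figures reported in the text (variable names are illustrative).
expected = 32_400    # planned sample size (approximate)
completed = 30_027   # women who completed the survey
excluded = 823       # missing covariates or unknown pregnancy outcome

# Response rate as a percentage of the expected sample.
response_rate = completed / expected * 100

# Women remaining after exclusions.
analysed = completed - excluded

print(f"response rate: {response_rate:.2f}%")
print(f"analysed sample: {analysed:,}")
```

Both results agree with the reported response rate of 92.68% and the final sample of 29,204 women.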
Experts from multiple disciplines participated in the diagnosis of CHD in this study, including senior medical technicians from the departments of obstetrics and gynecology, ultrasound, and pediatric cardiac surgery at the First Affiliated Hospital of Xi'an Jiaotong University. To ensure the consistency and accuracy of the diagnosis, all cases of CHD were diagnosed by these experts based on the International Statistical Classification of Diseases and Related Health Problems (ICD-10) coding system. For children identified with cardiovascular anomalies, medical records were collected, and the children underwent comprehensive ultrasound examinations free of charge at the First Affiliated Hospital of Xi'an Jiaotong University for the final diagnosis.

Definitions of main variables
In this study, the primary outcome was the presence of CHD in the offspring. CHD was defined as a structural or functional abnormality of the heart that arises during fetal development and is present at birth, including conditions such as atrial septal defect, ventricular septal defect, patent ductus arteriosus, atrioventricular septal defect, tetralogy of Fallot, transposition of the great arteries, and hypoplastic left heart syndrome. For preterm infants who initially presented with atrial septal defect, persistent foramen ovale, or patent ductus arteriosus, follow-up assessments were conducted until 18 months of age to determine whether these anomalies persisted before CHD was diagnosed.
The periconceptional period encompassed the three months before conception and the early stages of pregnancy (up to 12 weeks). Family history of birth defects was defined as congenital disabilities in immediate relatives. "Infection" referred to at least one episode of flu or cold, or a mild infection, during early pregnancy. "Fever" referred to a temperature exceeding 38 °C at least once during early pregnancy. "Taking medicine" included the use of any drugs, such as antibiotics, anticancer agents, antidepressants, hormones, and other pharmaceuticals, during early pregnancy. "Alcohol consumption" referred to consuming alcohol at least once during early pregnancy. "Tobacco exposure" was defined as smoking one cigarette per week for three consecutive months or being passively exposed to smoke for 15 min daily for one month during the periconceptional period. "Pesticide exposure" referred to exposure to herbicides, fungicides, rodenticides, or insecticides during the periconceptional period. "Industrial exposure" referred to residing within one kilometer of mines, fertilizer factories, cement factories, paper mills, pesticide factories, or power plants during pregnancy. "Folic acid supplementation" was defined as the daily consumption of 400 µg of folic acid for at least three consecutive months during the periconceptional period.
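Two of the definitions above combine a dose with a duration, which makes them easy to express as explicit rules. The sketch below is one possible encoding, not the study's actual coding scheme: the record fields and function names are hypothetical, and reading each stated quantity as a minimum ("at least") is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PericonceptionalRecord:
    """Hypothetical questionnaire fields; not the study's variable names."""
    cigarettes_per_week: float
    smoking_months: int            # consecutive months of active smoking
    passive_minutes_per_day: float
    passive_months: int            # months of daily passive exposure
    folic_acid_ug_per_day: float
    folic_acid_months: int         # consecutive months of supplementation


def tobacco_exposure(r: PericonceptionalRecord) -> bool:
    # Active: one cigarette per week for three consecutive months.
    active = r.cigarettes_per_week >= 1 and r.smoking_months >= 3
    # Passive: 15 min of smoke exposure daily for one month.
    passive = r.passive_minutes_per_day >= 15 and r.passive_months >= 1
    return active or passive


def folic_acid_supplementation(r: PericonceptionalRecord) -> bool:
    # 400 µg daily for at least three consecutive months.
    return r.folic_acid_ug_per_day >= 400 and r.folic_acid_months >= 3


# A non-smoker with daily passive exposure who took folic acid for two months.
example = PericonceptionalRecord(0, 0, 20, 2, 400, 2)
print(tobacco_exposure(example), folic_acid_supplementation(example))
```

Writing the rules out this way makes the thresholds explicit: the example woman counts as tobacco-exposed through passive smoking alone, but not as supplemented, because two months falls short of the three-month requirement.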

Statistical analysis
Seventy percent of the subjects were randomly assigned to the training set, which was used to generate the nomogram, while the remaining 30% were allocated to the validation set, which was used to verify it. Categorical variables were described using frequency and percentage, and differences among groups were assessed using the χ2 test. To identify factors associated with CHD, univariate logistic regression models were applied to the training set. Significant variables from the univariate analysis were included in a multivariate logistic regression model, with the final model selected using a forward stepwise method. Prior to multivariate analysis, collinearity was assessed using contingency coefficients. Based on the results of the multivariate logistic regression, a nomogram was developed to predict the risk of CHD using the selected variables. To evaluate the model's discriminative performance, the area under the ROC curve (AUC), which equals the C index for a binary outcome, was calculated. Calibration was evaluated using the Hosmer-Lemeshow chi-square test. Additionally, the importance of predictive variables was ranked in a random forest model using the mean decrease Gini method. Data analysis was conducted using R software, version 3.5.1. A two-tailed p < 0.05 was considered statistically significant.
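To make the two performance measures concrete, the following sketch computes a C index and a Hosmer-Lemeshow statistic from scratch. It is an illustration in Python rather than the R code used in the study; the function names, the number of groups, and the toy data are all assumptions made for this example.

```python
def c_index(y_true, y_prob):
    """Share of case/non-case pairs in which the case has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow chi-square: sort by predicted risk, split into
    equal-sized groups, and compare observed with expected events."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        # (O - E)^2 / (n * pbar * (1 - pbar)), with E = n * pbar
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return chi2


# Toy data (hypothetical): outcomes and predicted risks for six subjects.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]
print(c_index(y, p), hosmer_lemeshow(y, p, groups=2))
```

A C index of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. A small Hosmer-Lemeshow statistic with a large p-value indicates no evidence of miscalibration. The pairwise loop is quadratic in the sample size, so a rank-based formula would be preferable for a dataset of this study's size.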

Patient demographics
A total of 29,204 women participated in the study, with 20,412 women assigned to the training group and the remaining 8,792 women assigned to the validation group. Table 1 presents the baseline characteristics of all enrolled subjects. The χ2 test revealed no significant differences in the characteristics of the participants between the training and validation groups.
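The reported group sizes can be checked against the 7:3 design, and a random split of this kind is easy to sketch. In the example below the shuffling method and the seed are assumptions, not the authors' procedure; the study's own split was performed in R and is not described in further detail.

```python
import random

# Reported group sizes.
n_total, n_train, n_valid = 29_204, 20_412, 8_792
train_share = n_train / n_total   # about 0.699
valid_share = n_valid / n_total   # about 0.301
print(f"training {train_share:.1%}, validation {valid_share:.1%}")


def split_70_30(ids, seed=2013):
    """Shuffle subject identifiers and cut at 70% (seed is arbitrary)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


train, valid = split_70_30(range(n_total))
print(len(train), len(valid))
```

An exact 70% cut of 29,204 subjects gives 20,443 and 8,761, slightly different from the reported 20,412 and 8,792. The reported sizes would be consistent with, for example, assigning each subject independently with probability 0.7, although the text does not say which method was used.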

Nomogram development
Table 2 summarizes the results of the univariate and multivariate analyses of potential predictive risk factors for CHD in the training set. Eight significant predictive risk factors for CHD were identified: gravidity, preterm birth history, family history of birth defects, infection, taking medicine, tobacco exposure, pesticide exposure and singleton/twin pregnancy.
The importance of the predictor variables was ranked using the random forest algorithm (Fig. 1). A higher mean decrease in the Gini coefficient indicated greater variable importance. Taking medicine emerged as the most important predictive risk factor, followed by infection, tobacco exposure, gravidity, pesticide exposure, preterm birth history, singleton/twin pregnancy, and family history of birth defects.
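The quantity behind this ranking is the drop in Gini impurity when a tree node is split on a variable. A random forest accumulates these drops over all splits on that variable and averages them over trees. The sketch below computes the drop for a single split; the counts are invented for illustration and do not come from the study data.

```python
def gini(n_cases, n_noncases):
    """Gini impurity of a node with a binary outcome: 2 * p * (1 - p)."""
    n = n_cases + n_noncases
    if n == 0:
        return 0.0
    p = n_cases / n
    return 2 * p * (1 - p)


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of
    its two children. Each argument is a (cases, non-cases) tuple."""
    n = sum(parent)
    weighted = (sum(left) / n) * gini(*left) + (sum(right) / n) * gini(*right)
    return gini(*parent) - weighted


# Hypothetical node: 10 cases among 100 subjects, split on one exposure.
parent = (10, 90)
exposed = (8, 22)      # most cases fall in the exposed branch
unexposed = (2, 68)
decrease = gini_decrease(parent, exposed, unexposed)
print(round(decrease, 4))
```

A split that separates cases from non-cases well yields a large decrease, so a variable that repeatedly produces such splits ranks high. A variable unrelated to the outcome leaves the child nodes about as impure as the parent and contributes little.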
Based on the multivariate logistic regression model with eight independent predictive risk factors, we developed an individualized nomogram model for the prediction of CHD (Fig. 2). The nomogram assigned a specific score to each predictive risk factor, and the total score was calculated by summing these individual scores. The predicted risk of CHD was then read off as the probability corresponding to the total score. For example, a pregnant woman would score 0 points if she was in her first pregnancy with no preterm birth history, no family history of birth defects, and a singleton pregnancy. If this woman also experienced infection, taking medicine, tobacco exposure, and pesticide exposure, she would accumulate 64, 62, 36, and 100 points, respectively. Therefore, the total score would be 64 + 62 + 36 + 100 = 262. According to Fig. 3, the predicted risk of CHD in her offspring would be 0.079 (79‰).
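The worked example amounts to a table lookup followed by a sum. The sketch below reproduces it using only the four point values stated in the text. Points for the other four factors and the mapping from total points to risk must be read from the nomogram figures and are deliberately left out rather than guessed.

```python
# Points stated in the text for the four exposures in the worked example.
POINTS = {
    "infection": 64,
    "taking medicine": 62,
    "tobacco exposure": 36,
    "pesticide exposure": 100,
}


def total_points(factors_present):
    """Sum the nomogram points of the risk factors that are present."""
    return sum(POINTS[f] for f in factors_present)


# First pregnancy, no preterm birth history, no family history, singleton:
# these contribute 0 points, so only the four exposures add to the score.
total = total_points(
    ["infection", "taking medicine", "tobacco exposure", "pesticide exposure"]
)

# Risk read from the nomogram for this total, as reported in the text.
risk = 0.079
per_mille = round(risk * 1000)

print(total, per_mille)
```

In nomograms built from logistic regression, points are usually proportional to the regression coefficients, with the strongest predictor scaled to 100. If that convention applies here, pesticide exposure had the largest coefficient in the final model.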

Nomogram validation
The predictive model was validated in terms of both calibration and discrimination. ROC curves for the predicted probability were constructed for both the validation and training groups, and the AUC values were calculated. The AUC values for the nomogram, which included 8 independent predictive risk factors, were 0.714 (95% CI: 0.630, 0.798) for the validation group and 0.716 (95% CI: 0.671, 0.760) for the training group, as shown in Table 3 and Fig. 4. These AUC values indicated that the nomogram prediction model might possess moderate discrimination ability. The optimal threshold for the model was identified as 0.0076, at which the model achieved a sensitivity of 59.29%, a specificity of 75.39%, and an accuracy of 75.28%. Furthermore, the Hosmer-Lemeshow chi-square statistic was 1.529 with a p-value of 0.910, indicating good calibration of the predictive model (Fig. 5).
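The reported sensitivity, specificity, and accuracy are linked by a simple identity: accuracy is the prevalence-weighted average of sensitivity and specificity. The sketch below uses that identity as a back-of-envelope consistency check. Solving for the prevalence is an illustration added here, not a calculation from the study, and the result is sensitive to rounding in the reported percentages.

```python
# Reported performance at the threshold of 0.0076.
sens = 0.5929
spec = 0.7539
acc = 0.7528

# accuracy = sens * prev + spec * (1 - prev)
# => prev = (spec - acc) / (spec - sens)
implied_prev = (spec - acc) / (spec - sens)

# Plug the implied prevalence back in to confirm the identity holds.
acc_check = sens * implied_prev + spec * (1 - implied_prev)

print(f"implied prevalence: {implied_prev:.4f}")
print(f"accuracy recomputed: {acc_check:.4f}")
```

The implied prevalence is of the order of 0.7%, close to the chosen threshold of 0.0076. This also explains why accuracy sits so near specificity: with an outcome this rare, almost all subjects are non-cases, so accuracy is dominated by how well non-cases are classified.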

Discussion
This study developed an innovative predictive model for assessing the risk of CHD, in contrast to the predominant research focus on its etiology. A total of 29,204 pregnant women from Shaanxi province of China were included in this study, representing the largest cohort for the development of a nomogram to predict the risk of CHD. We identified eight predictive risk factors for CHD: gravidity, preterm birth history, family history of birth defects, infection, taking medicine, tobacco exposure, pesticide exposure and singleton/twin pregnancy. CHD is the most prevalent congenital malformation, accounting for approximately one-third of all congenital anomalies [21]. Between 1970 and 2017, the global birth prevalence of CHD increased steadily by 10% every five years, peaking at 9.41 per 1000 births [3]. The prevalence of CHD was higher in Asia than in Europe and America [3]. Given China's large population, CHD has emerged as a significant maternal and child health concern. Screening for risk factors and assessing risk levels are crucial for the population-based primary prevention of CHD [22,23].

Fig. 1 The importance ranking of factors related to CHD, including gravidity, preterm birth history, family history of birth defects, infection, taking medicine, tobacco exposure, pesticide exposure and singleton/twin pregnancy. The larger the mean decrease Gini, the more important the indicator

Fig. 2 Nomogram for predicting CHD

Several predictive models for CHD have been developed in previous research. Yun Liang et al. constructed a logistic regression model using four key predictive factors (respiratory infections, polluted water exposure, adverse emotions during pregnancy, and nutrient deficiencies) to predict the risk of CHD in the offspring of pregnant women. The model achieved an AUC of 0.72 (95% CI: 0.681, 0.759) [20]. In another study, Huixia Li et al. identified 15 predictive risk factors for CHD using univariate logistic regression analyses and developed a prediction model based on a feed-forward back-propagation neural network (BPNN). Their model demonstrated an AUC of 0.87 [19]. However, these models were derived from small-sample case-control studies, which may limit their generalizability to real-world populations. In contrast, our CHD prediction model was based on a large-scale CHD survey, resulting in a robust nomogram designed as a practical tool for predicting CHD.
Our study identified significant associations of twin pregnancy, family history of birth defects, preterm birth history, and gravidity with an increased risk of CHD. Epidemiological evidence has consistently reported associations of pregnancy history and twin pregnancy with CHD risk. Yu et al. conducted a systematic review and meta-analysis, revealing a summary OR of 1.13 (95% CI: 1.08, 1.18) for each additional pregnancy in relation to the risk of CHD [24]. They also found that a family history of birth defects increased the risk of CHD by 314% (OR: 4.14, 95% CI: 2.47, 6.96) [25]. Moreover, population-based data from the Northern Congenital Abnormality Survey in England between 1998 and 2010 indicated that twins have a 73% higher likelihood of CHD compared to singletons (RR: 1.73, 95% CI: 1.48, 2.04) [26]. Our findings were consistent with these previous investigations, underscoring the significance of these predictive factors in the risk assessment of CHD.
Our study revealed that the risk of CHD was linked to various diseases, adverse health behaviors, and environmental factors, including infection, medication use, tobacco exposure, and pesticide exposure. These findings were consistent with the existing literature. For instance, a case-control study demonstrated that an increased risk of CHD was associated with upper respiratory tract infections (OR: 3.40, 95% CI: 2.05, 5.62) and influenza (OR: 2.39, 95% CI: 1.47, 3.88) during early pregnancy [27]. Other studies have reported an increased risk of CHD in the offspring related to the use of antidepressants and antiepileptics during pregnancy [28,29]. According to a population-based matched case-control study involving 9,452 subjects, maternal smoking during the first trimester was significantly associated with an increased likelihood of CHD in infants (OR: 1.44, 95% CI: 1.25, 1.66) in a dose-response manner [30]. Moreover, a multisite case-control study indicated that exposure to fungicides, insecticides, and herbicides might elevate the risk of specific CHD subtypes, such as secundum atrial septal defect, hypoplastic left heart syndrome, and tetralogy of Fallot [31].
The predictive model serves as a valuable tool for obstetricians in assessing the risk of CHD in the fetus during early pregnancy. Where elevated risk is identified, pregnant women are advised to consult a fetal cardiologist and undergo a fetal heart ultrasound examination during mid-pregnancy. Furthermore, pregnant women should receive information and education regarding birth defects, along with guidance on reducing the risk of CHD through lifestyle adjustments, including folic acid supplementation.
Despite the progress achieved in this project, several limitations should be acknowledged. Firstly, all the data in this study came from a survey, potentially introducing recall bias into our results. To mitigate this bias, we implemented a stringent investigation protocol and selected clear indices related to exposure, thereby helping participants to recall long-term exposure histories accurately. For instance, detailed questionnaire items covering common diseases, types of medications, and their specific names during the periconceptional period were used in this study. Additionally, we cross-checked responses against any available paper medical records to minimize the impact of recall bias. Secondly, our study may have overlooked some predictive risk factors that could influence the value of the C index. Factors such as maternal obesity (pre-pregnancy and early pregnancy), pregestational diabetes mellitus, gestational diabetes, pre-existing hypertension, and genes related to folate metabolism have been reported to be associated with the risk of CHD in previous studies. These variables were not included in our current model, suggesting that future research could incorporate additional predictive factors to improve the predictive capability of our model. Finally, our prediction model was developed based on data from the Northwest Chinese population. Therefore, caution should be exercised when extrapolating these findings to other populations or geographic regions, as demographic and environmental factors can vary significantly.

Conclusion
In conclusion, we have developed an individualized nomogram to predict the risk of CHD in the offspring of pregnant women based on identified prenatal predictive risk factors. It could serve as a tool for assessing the risk of CHD in early pregnancy and for delivering targeted health education. Additional prospective birth cohort studies are warranted in the future to validate the efficacy and applicability of our nomogram.

Fig. 3 Example prediction nomogram for risk of CHD

Fig. 5 Calibration plot. The x-axis represents quintiles of predicted risk, and the y-axis shows the predicted and actual probability of CHD


Table 1
Baseline characteristics of the training group and validation group

Table 2
Univariate and multivariate logistic analysis of factors predicting CHD in the training group

Table 3
The AUCs of the ROC curves for the nomogram and variables from the logistic regression model in the training group and validation group